Model-Free Monte Carlo-like Policy Evaluation
Abstract
We propose an algorithm for estimating the finite-horizon expected return of a closed-loop control policy from an a priori given (off-policy) sample of one-step transitions. It averages cumulated rewards along a set of “broken trajectories” made of one-step transitions selected from the sample on the basis of the control policy. Under some Lipschitz continuity assumptions on the system dynamics, reward function and control policy, we provide bounds on the bias and variance of the estimator that depend only on the Lipschitz constants, on the number of broken trajectories used in the estimator, and on the sparsity of the sample of one-step transitions.
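The estimator described in the abstract can be illustrated with a minimal sketch. The abstract does not spell out how transitions are selected, so the rule used here is an assumption: at each step, pick the unused transition whose state and action are closest to the current state and to the action the policy would take, and use each transition at most once. The function name `mfmc_estimate`, the `(x, u, r, y)` tuple layout, the scalar states and actions, and the absolute-difference distance are likewise hypothetical choices made for illustration.

```python
def mfmc_estimate(transitions, policy, x0, horizon, n_traj):
    """Average the cumulated rewards of n_traj broken trajectories.

    transitions: list of (x, u, r, y) tuples with scalar state x, action u,
                 reward r and next state y (assumed layout).
    policy:      function (t, x) -> u, the closed-loop policy to evaluate.
    """
    if n_traj * horizon > len(transitions):
        raise ValueError("need at least n_traj * horizon transitions")

    unused = set(range(len(transitions)))
    returns = []
    for _ in range(n_traj):
        x, total = x0, 0.0
        for t in range(horizon):
            u = policy(t, x)

            # Assumed selection rule: nearest unused transition in
            # (state, action) space; ties are broken by lowest index.
            def distance(i):
                xi, ui, _, _ = transitions[i]
                return (abs(xi - x) + abs(ui - u), i)

            best = min(unused, key=distance)
            unused.remove(best)  # assumed: each transition is used once

            _, _, r, y = transitions[best]
            total += r
            # Continue from the end state of the selected transition,
            # not from a simulated successor: the trajectory is "broken".
            x = y
        returns.append(total)
    return sum(returns) / n_traj
```

The estimate is the plain average of the cumulated rewards over the broken trajectories; no model of the dynamics or reward function is fitted, which is what makes the approach model-free.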
Similar resources
Error Bounds in Reinforcement Learning Policy Evaluation
With the advent of Kearns & Singh’s (2000) rigorous upper bound on the error of temporal difference estimators, we derive the first rigorous error bound for the maximum likelihood policy evaluation method as well as deriving a Monte Carlo matrix inversion policy evaluation error bound. We provide the first direct comparison between the error bounds of the maximum likelihood (ML), Monte Carlo m...
Loss of Load Expectation Assessment in Deregulated Power Systems Using Monte Carlo Simulation and Intelligent Systems
Deregulation policy has caused some changes in the concepts of power systems reliability assessment and enhancement. In this paper, generation reliability is considered, and a method for its assessment using intelligent systems is proposed. Also, because of the stochastic behavior of the power market and of generators’ forced outages, Monte Carlo Simulation is used for reliability evaluation. Generation r...
Development and implementation of a Monte Carlo framework for evaluation of patient specific out-of-field organ equivalent dose
Background: The aim of this study was to develop and implement a Monte Carlo framework for evaluation of patient specific out-of-field organ equivalent dose (OED). Materials and Methods: Dose calculations were performed using a Monte Carlo-based model of Oncor linac and tomographic phantoms. Monte Carlo simulations were performed using EGSnrc user codes. Dose measurements were performed using r...
Factoring Exogenous State for Model-Free Monte Carlo
Policy analysts wish to visualize a range of policies for large simulator-defined Markov Decision Processes (MDPs). One visualization approach is to invoke the simulator to generate on-policy trajectories and then visualize those trajectories. When the simulator is expensive, this is not practical, and some method is required for generating trajectories for new policies without invoking the sim...
Monte Carlo Study of the Effect of Backscatter Material Thickness on 99mTc Source Response in Single Photon Emission Computed Tomography
Introduction: SPECT projections are contaminated by scatter radiation, resulting in reduced image contrast and quantitative errors. Backscatter constitutes a major part of the scatter contamination in lower energy windows. The current study is an evaluation of the effect of backscatter material on FWHM and image quality investigated by Monte Carlo simulation. Materials and Methods: SIMIND program...